535. Encode and Decode TinyURL
Medium
Note: This is a companion problem to the System Design problem: Design TinyURL.

TinyURL is a URL shortening service where you enter a URL such as https://leetcode.com/problems/design-tinyurl and it returns a short URL such as http://tinyurl.com/4e9iAk. Design a class to encode a URL and decode a tiny URL.

There is no restriction on how your encode/decode algorithm should work. You just need to ensure that a URL can be encoded to a tiny URL and the tiny URL can be decoded to the original URL.

Implement the Solution class:

  • Solution() Initializes the object of the system.
  • String encode(String longUrl) Returns a tiny URL for the given longUrl.
  • String decode(String shortUrl) Returns the original long URL for the given shortUrl. It is guaranteed that the given shortUrl was encoded by the same object.

 

Example 1:

Input: url = "https://leetcode.com/problems/design-tinyurl"
Output: "https://leetcode.com/problems/design-tinyurl"

Explanation:
Solution obj = new Solution();
string tiny = obj.encode(url); // returns the encoded tiny url.
string ans = obj.decode(tiny); // returns the original url after deconding it.

 

Constraints:

  • 1 <= url.length <= 104
  • url is guranteed to be a valid URL.
Accepted
146.9K
Submissions
177.8K
Seen this question in a real interview before?
Companies
Related Topics

Average Rating: 4.88 (61 votes)

Premium

Solution


Approach #1 Using Simple Counter[Accepted]

In order to encode the URL, we make use of a counter(ii), which is incremented for every new URL encountered. We put the URL along with its encoded count(ii) in a HashMap. This way we can retrieve it later at the time of decoding easily.

Performance Analysis

  • The range of URLs that can be decoded is limited by the range of int\text{int}.

  • If excessively large number of URLs have to be encoded, after the range of int\text{int} is exceeded, integer overflow could lead to overwriting the previous URLs' encodings, leading to the performance degradation.

  • The length of the URL isn't necessarily shorter than the incoming longURL\text{longURL}. It is only dependent on the relative order in which the URLs are encoded.

  • One problem with this method is that it is very easy to predict the next code generated, since the pattern can be detected by generating a few encoded URLs.


Approach #2 Variable-length Encoding[Accepted]

Algorithm

In this case, we make use of variable length encoding to encode the given URLs. For every longURL\text{longURL}, we choose a variable codelength for the input URL, which can be any length between 0 and 61. Further, instead of using only numbers as the Base System for encoding the URLSs, we make use of a set of integers and alphabets to be used for encoding.

Performance Analysis

  • The number of URLs that can be encoded is, again, dependent on the range of int\text{int}, since, the same countcount will be generated after overflow of integers.

  • The length of the encoded URLs isn't necessarily short, but is to some extent dependent on the order in which the incoming longURL\text{longURL}'s are encountered. For example, the codes generated will have the lengths in the following order: 1(62 times), 2(62 times) and so on.

  • The performance is quite good, since the same code will be repeated only after the integer overflow limit, which is quite large.

  • In this case also, the next code generated could be predicted by the use of some calculations.


Approach #3 Using hashcode[Accepted]

Algorithm

In this method, we make use of an inbuilt function hashCode()\text{hashCode()} to determine a code for mapping every URL. Again, the mapping is stored in a HashMap for decoding.

The hash code for a String object is computed(using int arithmetic) as −

s[0]31(n1)+s[1]31(n2)+...+s[n1]s[0]*31^{(n - 1)} + s[1]*31^{(n - 2)} + ... + s[n - 1] , where s[i] is the ith character of the string, n is the length of the string.

Performance Analysis

  • The number of URLs that can be encoded is limited by the range of int\text{int}, since hashCode\text{hashCode} uses integer calculations.

  • The average length of the encoded URL isn't directly related to the incoming longURL\text{longURL} length.

  • The hashCode()\text{hashCode()} doesn't generate unique codes for different string. This property of getting the same code for two different inputs is called collision. Thus, as the number of encoded URLs increases, the probability of collisions increases, which leads to failure.

  • The following figure demonstrates the mapping of different objects to the same hashcode and the increasing probability of collisions with increasing number of objects.

Encode_and_Decode_URLs

  • Thus, it isn't necessary that the collisions start occuring only after a certain number of strings have been encoded, but they could occur way before the limit is even near to the int\text{int}. This is similar to birthday paradox i.e. the probability of two people having the same birthday is nearly 50% if we consider only 23 people and 99.9% with just 70 people.

  • Predicting the encoded URL isn't easy in this scheme.


Approach #4 Using random number[Accepted]

Algorithm

In this case, we generate a random integer to be used as the code. In case the generated code happens to be already mapped to some previous longURL\text{longURL}, we generate a new random integer to be used as the code. The data is again stored in a HashMap to help in the decoding process.

**Performance Analysis**
  • The number of URLs that can be encoded is limited by the range of int\text{int}.

  • The average length of the codes generated is independent of the longURL\text{longURL}'s length, since a random integer is used.

  • The length of the URL isn't necessarily shorter than the incoming longURL\text{longURL}. It is only dependent on the relative order in which the URLs are encoded.

  • Since a random number is used for coding, again, as in the previous case, the number of collisions could increase with the increasing number of input strings, leading to performance degradation.

  • Determining the encoded URL isn't possible in this scheme, since we make use of random numbers.


Approach #5 Random fixed-length encoding[Accepted]

Algorithm

In this case, again, we make use of the set of numbers and alphabets to generate the coding for the given URLs, similar to Approach 2. But in this case, the length of the code is fixed to 6 only. Further, random characters from the string to form the characters of the code. In case, the code generated collides with some previously generated code, we form a new random code.

Performance Analysis

  • The number of URLs that can be encoded is quite large in this case, nearly of the order (10+262)6(10+26*2)^6.

  • The length of the encoded URLs is fixed to 6 units, which is a significant reduction for very large URLs.

  • The performance of this scheme is quite good, due to a very less probability of repeated same codes generated.

  • We can increase the number of encodings possible as well, by increasing the length of the encoded strings. Thus, there exists a tradeoff between the length of the code and the number of encodings possible.

  • Predicting the encoding isn't possible in this scheme since random numbers are used.

Report Article Issue

Comments: 21
thepatelguy's avatar
Read More

Approach 2, how come there is count++ after the return statement in encode() and it is an accepted answer?

47
Show 2 replies
Reply
Share
Report
juanqiuxiaoxiao's avatar
Read More

In Method 1, why don't we start from i = Integer.MIN_VALUE?

15
Show 5 replies
Reply
Share
Report
runningsnail's avatar
Read More

Do we need to de-dupe the long url? Can one same long url have multiple tinyUrls?

8
Show 1 reply
Reply
Share
Report
ghudeihed's avatar
Read More

I used a Guid for the key and answer was accepted

public class Codec {
    private Dictionary<string,string> hashTable = new Dictionary<string,string>();
        
    // Encodes a URL to a shortened URL
    public string encode(string longUrl) {
        string hashKey = new Guid().ToString().Replace("-", "");
        hashTable[hashKey] = longUrl;
        return "http://tinyurl.com/" + hashKey;
    }

    // Decodes a shortened URL to its original URL.
    public string decode(string shortUrl) {
        string hashKey = shortUrl.Replace("http://tinyurl.com/", "");
        return hashTable[hashKey];
    }
}   

6
Reply
Share
Report
fuyaoli's avatar
Read More

the Approach#2 doesn't work actually, what's wrong with it ?

3
Show 2 replies
Reply
Share
Report
ricace's avatar
Read More

really nice solutions. Thanks the author

2
Reply
Share
Report
smillar09's avatar
Read More

@workSmart the number of URLs that can be encoded with unique keys is the same as as number of possible outcomes of the 6 character string, which is 10 (for the numbers) + 26*2 (for the upper and lower case letters) ^ 6 (since the string is 6 characters long). If for example the string was 10 characters long, and there were no upper case letters in the defined alphabet, then the number of URLs would be 10 (for the numbers) + 26 (lower case letters only) ^ 10 (since the string is 10 characters long). Hope this helps!

1
Reply
Share
Report
stella_'s avatar
Read More

for Approach #2 Variable-length Encoding, it s not 1:1 mapping. there will have duplicate converted tiny url since every 62 there '!' == '_' (33+62=95). should there also need a hashmap storing these duplicates ?

1
Show 1 reply
Reply
Share
Report
Govind93's avatar
Read More

Great Step by Step Solution

0
Reply
Share
Report
softwareshortcut's avatar
Read More

Is there a little mistake in Approach 5? We want to generate a random number from 0 to 61 and not 0 to 62.
alphabet.charAt(62) will break with StringIndexOutOfBoundsException. Unless nextInt() function excludes the max.

0
Show 2 replies
Reply
Share
Report
Time Submitted
Status
Runtime
Memory
Language
03/16/2021 13:57Accepted0 ms7.5 MBcpp
03/16/2021 13:56Accepted8 ms7.5 MBcpp
/23
Your previous code was restored from your local storage.Reset to default
Contribute
|||
Saved
Trie
#208 Implement Trie (Prefix Tree)
Medium
#211 Design Add and Search Words Data Structure
Medium
#212 Word Search II
Hard
#336 Palindrome Pairs
Hard
#421 Maximum XOR of Two Numbers in an Array
Medium
#425 Word Squares
Hard
#472 Concatenated Words
Hard
#642 Design Search Autocomplete System
Hard
#648 Replace Words
Medium
#676 Implement Magic Dictionary
Medium
#677 Map Sum Pairs
Medium
#692 Top K Frequent Words
Medium
#720 Longest Word in Dictionary
Easy
#745 Prefix and Suffix Search
Hard
#1023 Camelcase Matching
Medium
#1032 Stream of Characters
Hard
#1065 Index Pairs of a String
Easy
#1638 Count Substrings That Differ by One Character
Medium
#1698 Number of Distinct Substrings in a String
Medium
#1707 Maximum XOR With an Element From Array
Hard
#1803 Count Pairs With XOR in a Range
Hard
#1804 Implement Trie II (Prefix Tree)
Medium
#1858 Longest Word With All Prefixes
Medium